AD-A115 902  DAVID W TAYLOR NAVAL SHIP RESEARCH AND DEVELOPMENT CE-- ETC  F/G 9/5
INFORMATION  SYSTEMS  DESIGN  METHODOLOGY:  OVERVIEW. (U) 

MAY  82  D  K  JEFFERSON 

UNCLASSIFIED  DTNSRDC-82/043  NL




UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT  DOCUMENTATION  PAGE 

READ  INSTRUCTIONS 

BEFORE  COMPLETING  FORM 

1. REPORT NUMBER    2. GOVT ACCESSION NO.    3. RECIPIENT'S CATALOG NUMBER

DTNSRDC-82/043

4. TITLE (and Subtitle)

INFORMATION SYSTEMS DESIGN METHODOLOGY: OVERVIEW

5. TYPE OF REPORT & PERIOD COVERED

Final Report
June 1981-May 1982

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)

David K. Jefferson

8. CONTRACT OR GRANT NUMBER(s)

9. PERFORMING ORGANIZATION NAME AND ADDRESS

David W. Taylor Naval Ship Research
and Development Center
Bethesda, Maryland 20084

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

(See reverse side)

11. CONTROLLING OFFICE NAME AND ADDRESS

Naval Supply Systems Command
Research and Technology Division
Washington, D.C. 20376

12. REPORT DATE

May 1982

13. NUMBER OF PAGES

62

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

UNCLASSIFIED

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)


APPROVED  FOR  PUBLIC  RELEASE:  DISTRIBUTION  UNLIMITED 


17.  DISTRIBUTION  STATEMENT  (of  the  abstract  entered  In  Block  20,  if  different  from  Report) 


18. SUPPLEMENTARY NOTES


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Information Systems Design
Logistics Systems
Requirements Analysis
Data Base Design
Problem Statement Language/Problem Statement Analyzer


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

This technical report briefly describes a six-phase methodology for
designing an information system: formulation of the system outline,
analysis of requirements, design of the global logical data base,
definition of data base processes, design of the physical data base, and
simulation of data base operations. The methodology is based on the extensive

(Continued  on  reverse  side) 


DD FORM 1473, 1 JAN 73    EDITION OF 1 NOV 65 IS OBSOLETE

S/N 0102-LF-014-6601


UNCLASSIFIED

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

(Block  10) 

Program  Element  62760N 
Task  Area  TF60531100 
Work  Unit  1821-009 

(Block  20  continued) 

use  of  computer-aided  design  tools,  including  the  Problem  Statement 
Language/Problem Statement Analyzer (PSL/PSA). The development and
application  of  the  methodology  to  a  very  large  design  effort  are 
described;  numerous  actual  problems  are  described  to  demonstrate  the 
need  for  the  methodology. 




TABLE OF CONTENTS

                                                                  Page

LIST OF FIGURES .... iv
LIST OF ABBREVIATIONS .... vi
ABSTRACT .... 1
ADMINISTRATIVE INFORMATION .... 1
INTRODUCTION .... 3
BACKGROUND ON THE ICP SYSTEM .... 3
BACKGROUND ON ISDNLS .... 4
SYSTEMS DESIGN FOR LOOSELY INTEGRATED SYSTEMS .... 7
FORMULATION OF SYSTEM CONCEPTS AND GUIDELINES .... 7
ANALYSIS OF REQUIREMENTS .... 7
FORMULATION OF DETAILED DESIGN .... 8
CRITIQUE .... 8
PHASE 0. FORMULATE SYSTEM OUTLINE .... 9
SYSTEM CONCEPTS AND GUIDELINES .... 9
CONCEPTUAL DATA STRUCTURE .... 9
FUNCTIONAL REQUIREMENTS FOR THE SYSTEM .... 11
PHASE 1. ANALYZE REQUIREMENTS .... 13
WRITE REQUIREMENTS STATEMENTS .... 13
DRAW FUNCTIONAL DIAGRAMS .... 15
MECHANIZE REQUIREMENTS STATEMENTS .... 15
CRITIQUE .... 18
PHASE 2. DESIGN GLOBAL LOGICAL DATA BASE .... 25
DESIGN LOGICAL DATA STRUCTURE FOR SUBSYSTEMS .... 25
DESIGN LOGICAL DATA STRUCTURE FOR THE SYSTEM .... 25
ACTUAL DESIGN OF THE GLOBAL LOGICAL DATA BASE .... 26
DETERMINE ENTITIES AND RELATIONSHIPS .... 27
DEFINE BOXES AND LINES .... 28
REVIEW AND REVISE GLDB STRUCTURE .... 31
CRITIQUE .... 34
PHASE 3. DEFINE DATA BASE PROCESSES .... 37
MAP THE RS DATA INTO THE GLDB STRUCTURE .... 37
SPLIT OR COMBINE RS PROCESSES .... 37
ASSIGN DATA TO FILE OR DATA BASE .... 38
REVIEW RS PROCESSES .... 38
DIAGRAM DATA BASE PROCESSES .... 39
REPRESENT DATA BASE WORKLOAD IN PSL/PSA .... 39
REVIEW AND REVISE GLDB STRUCTURE .... 43
PHASE 4. DESIGN PHYSICAL DATA BASE .... 47
SIMPLIFY DATA BASE STRUCTURE AND PROCESSES .... 47
DETERMINE SIZES, VOLUMES, AND VOLATILITIES .... 48
FORM CANONICAL RECORDS .... 48
CONVERT TO PHYSICAL LEVEL .... 48
DESIGN PHYSICAL STRUCTURE .... 49
ANALYZE RESULTS AND ITERATE .... 49
PHASE 5. SIMULATE DATA BASE OPERATION .... 51
PHASE 6. DESIGN OPERATIONAL SUBSYSTEMS .... 53
CONCLUSIONS .... 54
REFERENCES .... 55

LIST OF FIGURES

                                                                  Page

1 - Simplified Example of a Conceptual Data Structure .... 10
2 - Example of RS Structure .... 14
3 - Example of an Overview Diagram .... 15
4 - Example of a Detailed Diagram .... 16
5 - Example of a Data Collection .... 17
6 - Example of an RS Summary .... 19
7 - Example of a Consistency Check .... 20
8 - Example of a Completeness Check .... 21
9 - Example of an RS Overview .... 22
10 - Example of an RS Data Report .... 23
11 - Example of RS Process Structure .... 23
12 - Example of RS Process Detail .... 24
13 - Example of an Initial List of Entities .... 27
14 - Example of a Diagram of Entities and Relationships .... 28
15 - Example of GLDB Structure .... 32
16 - Example of Entity Detail .... 33
17 - Example of Relationship Detail .... 34
18 - Example of a GLDB Diagram .... 35
19 - Example of a Diagram of a Data Base Process .... 40
20 - Example of Workload Structure .... 41
21 - Generalized Example of a Diagram of a Data Base Process .... 42
22 - Generalized Example of Workload Detail .... 43
23 - Example of Workload Detail .... 44
24 - Continuation of Workload Detail .... 45

LIST  OF  ABBREVIATIONS 


ASO - Aviation Supply Office
DEN - Data Element Number
DTNSRDC - David W. Taylor Naval Ship Research and Development Center
ECSS - Extendable Computer System Simulator
FEDSIM - Federal Computer Performance Evaluation and Simulation Center
GLDB - Global Logical Data Base
ICP - Inventory Control Point
ISDNLS - Information Systems Design for Navy Logistics Systems
ISDOS - Information Systems Design and Optimization System
NAVSUP - Naval Supply Systems Command
PSA - Problem Statement Analyzer
PSL - Problem Statement Language
RS - Requirements Statement


ABSTRACT 


This technical report briefly describes a six-phase methodology
for designing an information system: formulation of the
system outline, analysis of requirements, design of the global
logical data base, definition of data base processes, design of
the physical data base, and simulation of data base operations.
The methodology is based on the extensive use of computer-aided
design tools, including the Problem Statement Language/Problem
Statement Analyzer (PSL/PSA). The development and application
of the methodology to a very large design effort are described;
numerous actual problems are described to demonstrate the need
for the methodology.

ADMINISTRATIVE  INFORMATION 

This work was performed in the Computer Sciences and Information Systems Division
of the Computation, Mathematics, and Logistics Department under the sponsorship
of NAVSUP 033, Task Area TF60531100, Work Unit 1821-009.

INTRODUCTION

This report describes a methodology for designing very large, complex information
systems. The methodology is a research product of the Information Systems
Design for Navy Logistics Systems (ISDNLS) Research Project and was tested and
modified in the design of a new Inventory Control Point (ICP) System for the Naval
Supply Systems Command (NAVSUP). Problems encountered in this design effort are
described. Steps which should have been taken to avoid the problems, and the steps
which were actually taken to compensate for the problems, are briefly discussed.
Practicality is emphasized.

The  remainder  of  this  section  provides  a  brief  background  on  the  ICP  System  and 
on  the  ISDNLS  Research  Project.  The  next  section  outlines  the  systems  design  for  a 
loosely  integrated  system,  in  order  to  explain  the  need  for  additional  complexity  of 
methodology in the design of a highly integrated system. Additional sections then
describe  each  of  the  phases  of  the  methodology:  what  should  be  done,  what  was  done, 
what  went  wrong,  and  what  was  done  to  make  things  right.  A  detailed  technical 
report  will  be  available  in  the  future  on  Phase  2,  Design  Global  Logical  Data  Base. 

BACKGROUND  ON  THE  ICP  SYSTEM 

NAVSUP's  ICP  System  is  a  very  large  and  complex  system  with  a  scope  well  beyond 
what  is  usually  considered  inventory  control.  The  ICP  System  is  responsible  not  only 
for  some  800,000  supply  items  and  $1.5  billion  in  purchases  per  year,  but  is  also 
heavily  involved  in  program  planning,  configuration  management,  maintenance,  repair, 
technical  documentation,  and  many  other  applications  involving  supply  items.  The 
current  system  processes  50,000  transactions  a  day,  has  about  5  billion  characters 
of  on-line  storage,  and  is  based  on  about  5  million  lines  of  COBOL  code  and  an 
in-house  file  management  system.  This  system  is  now  about  15  years  old  and  its 
hardware  limitations  prohibit  the  development  of  important  new  applications. 
Maintenance  of  both  hardware  and  software  has  become  very  difficult  and  expensive, 
and  both  hardware  and  software  are  considerably  behind  the  state  of  the  art. 
Accordingly,  a  new  ICP  System  is  being  acquired.  The  new  system  will  have  at  least 
10  billion  characters  of  on-line  storage  and  will  include  132  major  applications. 

A new global logical data base has been designed, as described in "Design Global
Logical Data Base." It includes about 2,000 standardized Data Element Numbers
(DEN's), about 6,000 application-oriented data elements which must eventually be
standardized, 14 major data collections, 210 minor data collections, and 260
relationships among data collections. The new data base design is based on
Requirements Statements (RS's) of doubtful validity, as discussed in "Analyze
Requirements"; the logical design will therefore not be the basis for a physical design.

BACKGROUND  ON  ISDNLS 

Recognition of the need for a new ICP System led to the formation in July 1974
of the Information Systems Design for Navy Logistics Systems (ISDNLS) Research
Project. The objectives of this research project were to determine the state of the art
in information systems design and implementation, to determine the major weaknesses
of the ICP System, and to develop the technology needed to eliminate those weaknesses
in a new system. Initial efforts were devoted primarily to issues of implementation
and performance: data base design, distributed processing, and mass storage systems.

A fourth issue, systems analysis and design, was soon recognized as even more
critical; the ICP System required complete redesign rather than isolated improvements.
The system was so large and complex and had evolved over such a long period
of time that modification of one part would have had tremendous impact on the entire
system. Furthermore, the scope of the system was expanding far beyond the
item-oriented applications for which it was originally designed. Accordingly, a major
effort was devoted to systems analysis and design, in general, and to requirements
analysis in particular. The Problem Statement Language/Problem Statement Analyzer
(PSL/PSA) [1,2]* was acquired from the ISDOS (Information Systems Design and
Optimization System) Project at the University of Michigan. PSL is a language for describing
information systems; PSA is a computer program which accepts PSL statements, analyzes
and stores them in a data base, and produces analyses and documentation on demand.
PSL/PSA will be discussed in more detail in a later section, but the need for this
tool should be immediately obvious: the requirements and designs for the new ICP
System are simply too large and complex to be managed without computer assistance.

The methodology described in this report is still based on PSL/PSA, which provides
the complete, consistent, up-to-date requirements that are absolutely essential to
the development of a good information system. The methodology extends backward to
the work which must be done before requirements analysis and forward to the work
which is based on it. The following section outlines the need for the methodology.

*A complete listing of references is given on page 55.

Personnel  from  the  ISDNLS  Research  Project  were  heavily  involved  from  1978  to 
1982  in  two  aspects  of  the  ICP  System  redesign:  training  NAVSUP  personnel  in  the 
use  of  the  information  systems  design  methodology,  and  developing  and  maintaining 
the  PSA  data  bases  of  design  information.  The  problems  which  have  been  encountered, 
and  which  have  caused  major  changes  in  the  methodology,  have  generally  been  due 
much  more  to  lack  of  foresight  on  the  part  of  the  author  than  to  the  many  NAVSUP 
analysts  who  have  been  applying  the  methodology. 

SYSTEMS DESIGN FOR LOOSELY INTEGRATED SYSTEMS

Systems design for loosely integrated systems, such as those based on file
management systems, is accomplished by a fairly simple progression through three
phases: 1) Formulation of System Concepts and Guidelines, 2) Analysis of
Requirements, and 3) Detailed Design. Briefly, these phases establish why a system is
needed, what the system will do, and how the system will do it. Revisions may be
necessitated in the output of one phase to reflect the results of later phases
(e.g., detailed design may indicate that a requirement cannot be met with the given
resources), but such revisions should be rare.


FORMULATION  OF  SYSTEM  CONCEPTS  AND  GUIDELINES 

The result of this phase is a document describing the goals of the system and
the boundaries on both development and operation. The document should address
organizational goals, structure, and resources, and the system's relation to them.
The system goals may include new applications, more timely production of current
reports, increased security, and enhanced reliability. The description of the
organizational structure should include both formal and informal lines of communication,
the processes performed and their location, and the changes that can and cannot be
imposed on the organizational structure to accommodate the new system. The
description of available resources should include not only time, money, and personnel, but
also any applicable restrictions on the amount and type of training to be required
for system users, and requirements that existing hardware and software resources be
used (or that they not be used). Also, any known or anticipated changes to be made
in the organization's goals, structure, and resources should be specified as
precisely as possible. This phase is clearly critical to the design, development, and
acceptance of the system. It is also clear that top management must provide the
major input to this document, must understand the document and its implications, and
must ensure adherence to the document by system developers and users.

ANALYSIS  OF  REQUIREMENTS 

The  result  of  this  phase  is  a  document  describing  what  the  system  should  do, 
but  not  how  it  should  be  done.  The  document  should  include  not  only  the  functional 
transformations  from  inputs  to  outputs,  but  also  timing,  accuracy,  reliability,  and 
security.  Subsystems  and  their  interfaces  must  be  defined. 

FORMULATION OF DETAILED DESIGN


The result of this phase is a document describing how the requirements are to
be satisfied. The document should include file and program structures and should
generally address the subsystem level, since interfaces should be well-specified.
(In practice, interfaces are usually specified by whatever program is implemented
first.)

CRITIQUE 

Subsystems  may  often  be  designed  and  implemented  incrementally,  which  has  the 
important  advantage  that  benefits  and  experience  are  quickly  acquired.  Management 
gains  confidence  in  the  system,  and  problems  can  often  be  recognized  before  they 
affect  much  of  the  system.  However,  a  loosely  integrated  system  is  likely  to  be 
quite  redundant  in  both  programs  and  data  and  will  eventually  become  ineffective  and 
rigid  as  the  interfaces  become  more  and  more  complex. 

Systems  design  for  highly  integrated  systems,  such  as  those  based  on  data  base 
management  systems,  is  ideally  a  much  more  complex  and  iterative  process.  The 
reason  is,  basically,  that  design  now  must  involve  the  close  coordination  of  two 
separate  efforts:  data  design  and  process  design.  The  availability  of  centrally 
controlled,  shared  data  provides  much  more  flexibility,  which  means  more  work  for 
the designer. The phases described in the following sections show the feedback
between process design and data design.

PHASE 0. FORMULATE SYSTEM OUTLINE


This  phase  is  numbered  0,  because  it  often  has  been  completed  before  the  system 
designers  have  become  involved.  In  many  cases,  this  phase  is  treated  as  a  formal  but 
essentially  meaningless  requirement,  which  must  be  completed  before  the  "real"  work 
of  design  can  begin.  The  importance  of  this  phase  was  recognized  in  the  ICP  System 
redesign,  and  the  first  step,  formulation  of  system  concepts  and  guidelines,  was 
completed  satisfactorily  by  a  group  of  very  experienced,  high-level  NAVSUP  managers. 
However, the next two steps, defining the conceptual data structure and the functional
requirements for the system, were omitted, primarily due to lack of foresight
by  the  author,  with  unfortunate  results  in  the  subsequent  phases.  The  steps  that 
should  be  performed  are  described  in  the  following  paragraphs. 

SYSTEM  CONCEPTS  AND  GUIDELINES 

This step for a highly integrated system is very much like the first phase for
a loosely integrated system, but it is even more critical to the design, development,
and acceptance of the system. A loosely integrated system may be designed and
implemented one subsystem at a time, so that incremental benefits and experience may be
quickly acquired, but a highly integrated system requires much more design work
before the first subsystem can be implemented, so that both benefits and feedback are
considerably delayed. The benefits of a more flexible and effective system in the
long run are acquired at the cost of more planning at the beginning of system design.

CONCEPTUAL  DATA  STRUCTURE 

The output of this step is a document describing the entities (real-world
objects and concepts), and the relationships among them, that are to be modelled in the
information  system.  The  conceptual  data  structure  provides  a  common  language  for 
people  in  different  application  areas.  Without  such  a  language,  different  names  can 
be  given  to  the  same  thing  (resulting  in  data  redundancy)  and  the  same  name  can  be 
given  to  different  things  (resulting  in  incorrect  interfaces).  The  conceptual  data 
structure  also  greatly  facilitates  later  design  phases  by  providing  a  framework  to 
which  more  detail  can  be  readily  added. 

An example of a greatly simplified conceptual data structure is shown in
Figure 1. The boxes represent entities, and the lines represent relationships. A
line ending in a triangle (or branching lines on other figures) indicates that many
instances of the entity at that end of the relationship correspond to one instance
of the entity at the other end. For example, a "supply item" is a real-world
object that can be involved in many different "supply actions", but "supply action"
is a real-world concept that can involve only a single "supply item." Lines would
normally be labeled to identify the relationships.

Figure 1 - Simplified Example of a Conceptual Data Structure
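
The box-and-line notation can be sketched in a few lines of Python. This is only an illustrative model, not part of the methodology or of PSL/PSA: the class names, the method names, and the relationship label "is involved in" are assumptions introduced here; only the "supply item" and "supply action" entities and their one-to-many relationship come from the example above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """A one-to-many line between two entity boxes."""
    name: str   # hypothetical label for the line
    one: str    # entity at the "one" end of the line
    many: str   # entity at the "many" (triangle) end of the line


@dataclass
class ConceptualDataStructure:
    entities: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def add_entity(self, name):
        self.entities.add(name)

    def relate(self, name, one, many):
        # A line may only connect boxes that have already been drawn.
        for end in (one, many):
            if end not in self.entities:
                raise ValueError(f"unknown entity: {end}")
        self.relationships.append(Relationship(name, one, many))

    def many_side_of(self, entity):
        """Entities that may have many instances per instance of `entity`."""
        return sorted(r.many for r in self.relationships if r.one == entity)


cds = ConceptualDataStructure()
cds.add_entity("supply item")
cds.add_entity("supply action")
# One supply item can be involved in many supply actions.
cds.relate("is involved in", one="supply item", many="supply action")
print(cds.many_side_of("supply item"))  # prints ['supply action']
```

Rejecting a relationship whose ends are not declared entities mirrors the role of the conceptual data structure as a common vocabulary: a name must exist in the shared model before it can be used.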

The primary input to this step is the document on system concepts and guidelines,
supplemented by interviews with middle management to determine what data are
required  to  satisfy  high-level  goals. 

FUNCTIONAL  REQUIREMENTS  FOR  THE  SYSTEM 

The output of this step is a document describing the transformations necessary
to produce the required system outputs, given the inputs. The emphasis is on what,
rather than how: no attempt is made to specify algorithms, or even to decide whether
the transformations are to be performed by people or by computers. The intent is to
provide an outline of processing to complement and verify the conceptual data
structure, to provide an initial definition of the subsystems, and to provide an initial
definition of interfaces. This step will probably revise the conceptual data
structure to add, combine, and refine the objects, concepts, and relationships.

The functional requirements of the system are based on system goals and the
conceptual data structure, supplemented by interviews with middle management to
determine what processes are required to achieve the goals. Much of this step can be done
in  parallel  with  either  the  preceding  or  following  step.  The  development  of  the 
functional  requirements  for  a  subsystem  is  described  in  the  section  on  analysis  of 
requirements.  The  functional  requirements  for  the  system  are  at  a  much  more  abstract 
level  than  the  functional  requirements  for  the  subsystems  but  are  developed  in  the 
same  way. 

PHASE 1. ANALYZE REQUIREMENTS


The output of this phase is a document which adds a great deal of detail to the
functional requirements for the system, but which is still concerned with what is to
be done, rather than how it is to be done. It is likely that the preliminary
definitions of the subsystems and their interfaces may be revised. However, such changes
should have minimal impact on the conceptual and logical data structures, which model
inherent properties of real-world objects, concepts, and relationships and are
independent of the applications. The inputs for this phase consist of the logical data
structure, the functional requirements of the system, and functional details provided
by various people: those who will be the end users of the subsystem, management
science and operations research specialists who define the algorithms, and so on.

This phase was actually performed for the ICP System redesign without the
benefit of either the conceptual data structure or the functional requirements for the
system;  the  consequences  of  these  omissions  are  described  in  this  section. 

WRITE  REQUIREMENTS  STATEMENTS 

A requirements statement (RS) was written for each of the 132 applications
(subsystems). The RS's were intended to provide both the functional requirements and the
logical data structures for the subsystems. The structure of an RS is shown in
Figure 2. The RS's were to be combined, bottom-up, to yield the functional
requirements and logical data structure for the system. This combination proved to be
infeasible for the following reasons:

1.  RS  writers  received  minimal  training,  so  that  quality  was  often  low.  In 
particular,  many  did  little  more  than  write  an  operational  specification  for  the 
existing  system,  making  it  difficult  to  determine  what  the  requirements  really  were. 

2. RS's were often very detailed and complex, were written in English with
all its ambiguities, and followed a rather loose structure, so that internal
completeness and consistency could generally not be checked.

3. No conceptual or logical data structure was available to the RS writers,
so they devised their own data collections and even many data elements.
Consequently, there was no reasonable way of combining the RS's to form a functional
description of the whole system.

Figure 2 - Example of RS Structure

DRAW FUNCTIONAL DIAGRAMS

The first step toward making the RS's useful was to develop, for each RS, a
hierarchy of functional diagrams (Figures 3 and 4 are examples of first and second
level diagrams, respectively). These diagrams were intended to represent only the
functional requirements, without the detail of the RS's. The diagramming technique,
a simplified version of SADT (TM) [3], provided not only a clearer description of the
functions, but also a means of checking for internal consistency and completeness.
The quality of the diagrams was much higher than that of the RS's, in part because of
the technique and in part because the diagrams were constructed by a much smaller,
more highly trained group of people. However, the lack of consistency in data
collections still made it impossible to combine the diagrams for different RS's.

Figure 3 - Example of an Overview Diagram

MECHANIZE  REQUIREMENTS  STATEMENTS 

The second step toward making the RS's useful was to represent them in a
computer-processable form. The functional diagrams and data collections (see example
in Figure 5) were expressed in the Problem Statement Language (PSL) [1,2], stored in a
Problem Statement Analyzer (PSA) data base, analyzed for consistency and completeness
by PSA, and documented by PSA for review by NAVSUP analysts. Mechanization of the
RS's provided a way of combining the simplicity of the functional diagrams with the
element-level detail of the RS's. Mechanization also provided a reasonable way of
maintaining the RS's. Finally, mechanization provided a small but useful step toward
understanding the relationships among different RS's: the use of standardized Data
Element Numbers (DEN's) could be readily compared across RS's. Since RS writers
could also define their own non-standard data elements (those beginning with "X" in
Figure 4), which generally could not be compared across RS's, RS's could be compared
only to a limited degree.
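
The limited cross-RS comparison described above can be illustrated with a short Python sketch. It is not a PSA report: the function names, the RS identifiers, and the element names are hypothetical, and the only rule taken from the text is that names beginning with "X" are non-standard and are therefore excluded from comparison, even when two RS's happen to use the same "X" name.

```python
from itertools import combinations


def is_standard(element):
    # Assumption from the text: a leading "X" marks a non-standard element.
    return not element.startswith("X")


def compare_rs(rs_elements):
    """For each pair of RS's, list the standardized elements both reference."""
    shared = {}
    for a, b in combinations(sorted(rs_elements), 2):
        common = rs_elements[a] & rs_elements[b]
        shared[(a, b)] = sorted(e for e in common if is_standard(e))
    return shared


def local_elements(rs_elements):
    """For each RS, list the non-standard elements that cannot be compared."""
    return {
        rs: sorted(e for e in elements if not is_standard(e))
        for rs, elements in rs_elements.items()
    }


# Hypothetical RS identifiers and element names, for illustration only.
rs_elements = {
    "RS-1": {"D001-ITEM-ID", "D002-QUANTITY", "X-REORDER-FLAG"},
    "RS-2": {"D001-ITEM-ID", "D003-UNIT-PRICE", "X-REORDER-FLAG"},
    "RS-3": {"D002-QUANTITY", "X-LOCAL-NOTE"},
}

shared = compare_rs(rs_elements)
local = local_elements(rs_elements)
print(shared[("RS-1", "RS-2")])  # prints ['D001-ITEM-ID']
print(local["RS-3"])             # prints ['X-LOCAL-NOTE']
```

In this sketch "X-REORDER-FLAG" appears in two RS's but is not reported as shared, since two writers may have given the same name to different things.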

Examples  of  the  PSA  reports  used  to  check  the  validity  of  the  PSA  data  base  are 
shown  in  Figures  6-8.  Figure  6  indicates  that  the  ELEMENTS  have  not  yet  been  given 
DESCRIPTIONS  (English  text  carried  with  the  data  element  but  not  analyzed  by  PSA). 
Figure  7  is  a  list  of  names;  this  can  be  visually  checked  for  various  types  of  con¬ 
sistency,  such  as  spelling  and  adherence  to  naming  conventions.  Lines  51  and  52 
indicate  a  problem,  since  every  element  should  have  a  descriptive  title  as  well  as  a 
number.  Figure  8  indicates  either  that  the  INPUTS  and  OUTPUT  have  been  incompletely 
defined  (the  sources  of  the  INPUTS  and  destination  of  the  OUTPUT  are  missing),  or 
that  these  names  are  misspellings  of  some  other  data. 
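
The kind of checking performed by these reports can be approximated by the following Python sketch. It does not reproduce PSA or its report formats; the function name, the dictionary layout, and the process and data names are assumptions made for illustration. Note that the sketch flags every input that no process produces, so genuine external inputs would also be listed and would have to be reviewed by an analyst.

```python
def completeness_report(processes, descriptions):
    """Flag data with no source, no destination, or no description.

    processes:    process name -> {"inputs": [...], "outputs": [...]}
    descriptions: data name -> English description text
    """
    produced = {d for p in processes.values() for d in p["outputs"]}
    consumed = {d for p in processes.values() for d in p["inputs"]}
    return {
        "inputs_without_source": sorted(consumed - produced),
        "outputs_without_destination": sorted(produced - consumed),
        "data_without_description": sorted(
            d for d in consumed | produced if not descriptions.get(d)
        ),
    }


# Hypothetical processes; "valid-requst" is a deliberate misspelling.
processes = {
    "validate-request": {"inputs": ["request"], "outputs": ["valid-request"]},
    "fill-request": {"inputs": ["valid-requst"], "outputs": ["issue-document"]},
}
descriptions = {"request": "A customer request for a supply item."}

report = completeness_report(processes, descriptions)
for check, names in report.items():
    print(check, names)
```

The misspelled name shows up twice, as an input with no source and, under its correct spelling, as an output with no destination, which is the pattern the text describes for Figure 8.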

Figures  9-12  are  examples  of  reports  sent  to  the  NAVSUP  analysts.  Figure  9 
shows  an  overview  of  an  RS  -  its  INPUTS,  OUTPUT,  and  the  more  detailed  PROCESSes  into 
which  it  is  decomposed.  The  overview  is  useful  primarily  as  a  brief  orientation  to 
the  RS. 

Figure  10  describes  a  data  collection  (in  this  case  a  GROUP  -  which  may  be  a 
record).  Figure  11  shows  briefly  how  the  RS  processing  is  hierarchically  structured, 
and  Figure  12  describes  the  interaction  of  each  process  with  data. 
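
A hierarchical process structure of the kind reported in Figure 11 can be pictured as an indented outline. The Python sketch below is illustrative only: the `outline` function and the process names are hypothetical and do not reproduce the PSA report format.

```python
def outline(subparts, root, depth=0):
    """Return an indented outline of `root` and the processes below it."""
    lines = ["  " * depth + root]
    for child in subparts.get(root, []):
        lines.extend(outline(subparts, child, depth + 1))
    return lines


# Hypothetical decomposition of one RS into processes and subprocesses.
subparts = {
    "RS-EXAMPLE": ["receive-request", "fill-request"],
    "receive-request": ["validate-request"],
}

for line in outline(subparts, "RS-EXAMPLE"):
    print(line)
```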

CRITIQUE 

The  problems  encountered  in  this  phase  could  be  avoided  by  the  following 
procedure: 

1) Develop the system outline as proposed in Phase 0. Train analysts to use
that outline.

2) Develop functional diagrams to describe the functional requirements of the
subsystems.

3) Store and analyze the functional requirements with a tool such as PSL/PSA.

4) Add additional detail to the requirements data base.

5) At all times, provide adequate training to analysts in how their requirements
will be used, and limit the number of people working simultaneously
so that their efforts can be closely coordinated.

Figure 6 - Example of an RS Summary

The  objectives  of  this  procedure  are  to  provide  a  detailed  framework  within 
which  to  perform  each  step  of  the  design  phase,  to  provide  consistency,  and  to  detect 
errors  before  detailed  work  has  been  done.  The  design  steps  should  be  accomplished, 
or  at  least  closely  supervised,  by  a  small  group  of  highly  trained  people. 




PSA Version A5.1R5                RS-A812

Name List

[53 names, each of type ELEMENT; representative entries:]

   1  A002-UNIT-IDENTIFICATION-CODE     ELEMENT
  15  D032-SERIAL-NUMBER                ELEMENT
  43  E130-MAINTENANCE-INDEX-PAGE       ELEMENT
  51  E139                              ELEMENT
  52  E140                              ELEMENT


Figure  7  -  Example  of  a  Consistency  Check 

Figure 8 - Example of a Completeness Check

Picture Report

Figure 9 - Example of an RS Overview

Figure 10 - Example of an RS Data Report

Figure 11 - Example of RS Process Structure

Figure 12 - Example of RS Process Detail

PHASE 2. DESIGN GLOBAL LOGICAL DATA BASE

The output of this phase is to be a data base design which is global (i.e., it
supports the data base requirements of all the subsystems) and logical (i.e., it is
independent of any particular hardware or software environment). Ideally, this phase
should be accomplished by developing a logical data base design for each subsystem,
taking into consideration the functional requirements for the subsystem and the
conceptual data structure, and then combining the subsystem designs to form a logical
data base for the system. This procedure did not seem feasible in the ICP System
redesign, because there was no conceptual data base to provide common names and no
high-level structure to assure a reasonable degree of compatibility among the
subsystem designs. The preferred approach and then the actual approach are discussed
next.

DESIGN  LOGICAL  DATA  STRUCTURE  FOR  SUBSYSTEMS 

The output of this step should be a data structure which is independent of hardware
and software and which supports a given subsystem. The data structure appears
to the subsystem to represent a collection of application-oriented files - data of
interest only to other subsystems is suppressed wherever possible. Note, however,
that such data cannot always be suppressed: for example, a subsystem which prepares
an invoice may have to add a control field which is of interest only to another
subsystem. In general, it is possible to design top-down from the system to the
subsystems, but not bottom-up by combining the logical data structures for the
subsystems. The conceptual data structure is an indispensable input to this step to
ensure the compatibility of the subsystems. The other input, the functional
requirements for the subsystem, determines most, but not all, of the logical data structure
for the subsystem. This step will probably be done in parallel with the development
of the functional requirements for the subsystem and necessitate minor changes to
the conceptual data structure.

DESIGN  LOGICAL  DATA  STRUCTURE  FOR  THE  SYSTEM 

The output of this step should be a data structure which is independent of hardware
and software, is more detailed than the conceptual data structure, but is limited
in scope to data which is shared by different computer subsystems. The data may
be in a file (where they are controlled by one subsystem and accessed by a few
subsystems), or in a data base (where they are controlled by a data base management
system and are accessible by many subsystems). The logical data structure must, of
course, provide all the data required by the computer subsystems. Feedback to both
preceding phases is quite likely. The inputs to this phase are the conceptual data
structure and the functional requirements for the system. The logical data structure
may differ from the conceptual data structure in its period of applicability: the
conceptual data structure should be valid as long as the organization remains in
the same business, whereas the logical data structure will frequently change as
different applications are performed by computer. The logical data structure for
the system must be able, at any time, to satisfy all computerized data requirements
at that time.

ACTUAL  DESIGN  OF  THE  GLOBAL  LOGICAL  DATA  BASE 

The actual steps taken were rather different from the ideal. First, a global
logical data base (GLDB) structure was developed using a methodology that was both
top-down (from real-world objects, concepts, and relationships) and bottom-up (from
the DENs and non-standard data elements). The objective was to produce both the
conceptual data structure and the logical data structure. The initial version of the
GLDB proved to be extremely complex and much more expensive than expected. These
problems seem to have been caused by the large number of analysts involved (as many
as forty at the same time, on different RS's), by the lack of a conceptual data
structure to serve as a guide, by the introduction of a large amount of detail (the
data elements) at an early stage, and by the difficulty of resolving differences
among the analysts. An extensive revision of the GLDB has been completed, with a
great deal of improvement in both the simplicity and quality of the result. Part of
the improvement is undoubtedly due to the availability of the first GLDB to serve as
a guide, but a large part is due to the use of a much smaller group of people,
working strictly top-down. This improved methodology consists of three steps:

1)  Determine  Entities  and  Relationships, 

2)  Define  Boxes  and  Lines,  and 

3)  Review  and  Revise  GLDB  Structure. 

The  first  two  steps  are  performed  for  each  RS,  and  the  third  is  performed  as  needed 
on  the  GLDB,  which  is  a  composite  of  all  the  RS's. 

DETERMINE ENTITIES AND RELATIONSHIPS

This step determines the entities (real-world objects and concepts) and
relationships with which a particular RS is concerned. The primary sources of guidance
in determining the entities and relationships are RS Section 2, the System Summary;
Section 3.2, the System Functions; Section 6.D, the Functional Flow Charts (see
Figure 2 - Example of RS Structure); the RS Overview (Figure 9); the RS Data Report
(Figure 10); the RS Process Structure (Figure 11); and the RS Process Detail (Figure
12). An entity may be a real-world object, such as a "supplier", or a real-world
concept, such as an "invoice", and must have a meaningful name and an identifier
(primary key) by which instances can be distinguished from one another. Reports are
generally not entities, but the subjects of the reports are entities. Figure 13
shows the entities determined for one RS. Entities with the same identifiers are
then combined into a single entity. Entity names are then changed to correspond to
the names in the conceptual data structure (Figure 1). Relationships may be
suggested by the processing described in the RS, but they are based on inherent properties
of the data. Figure 14 is a greatly simplified diagram of the entities and
relationships relevant to a particular RS. Note the combination of "issue", "receipt", etc.
into "supply action." "Issue", "receipt", etc. may later become data subclasses -
i.e., specific types of "supply action" - if they contain unique data elements.

activity
supply item
issue
receipt 
adjustment 
freeze /unfreeze 
establish /select 
restow 

physical  inventory  request 

(This  list  is  for  the  exclusive  use  of  the  analyst  working  on  this  RS,  and  would 
normally  be  handwritten.) 


Figure  13  -  Example  of  an  Initial  List  of  Entities 


Figure  14  -  Example  of  a  Diagram  of  Entities  and  Relationships 

(This  diagram  is  for  the  exclusive  use  of  the  analyst  working  on  this  RS,  and  would 
normally  be  handwritten.  The  "P"  indicates  the  primary  key,  or  identifier.) 
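
The combination of entities that share an identifier can be sketched as a simple grouping. This is an illustration only: the identifier values below are hypothetical placeholders (the report does not give the identifiers behind Figure 13), and the function name is an assumption.

```python
from collections import defaultdict

def combine_by_identifier(entities):
    """Group candidate entity names that share the same identifier (primary key)."""
    groups = defaultdict(list)
    for name, identifier in entities:
        groups[identifier].append(name)
    return dict(groups)

# Entity names follow Figure 13; the identifiers are hypothetical.
candidates = [
    ("activity", "activity-id"),
    ("supply item", "item-id"),
    ("issue", "action-id"),
    ("receipt", "action-id"),
    ("adjustment", "action-id"),
]
combined = combine_by_identifier(candidates)
# "issue", "receipt", and "adjustment" fall into one group, which the
# analyst would then rename (for example, to "supply action").
```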

DEFINE  BOXES  AND  LINES 

The next step is to define data clusters (represented on a data structure
diagram by boxes) and relationships (represented by lines). There are five types of
boxes, distinguished by whether the identifier is determined entirely from data
within the data cluster (an entity that can exist independently of other entities), or
determined in whole or in part by data in another box (in which case its existence
depends on the existence of the other box):

1)  A  data  class  has  an  identifier  (indicated  by  the  "P")  within  the  box: 


dc-activity
P:A002


The  box  represents  an  independent  entity. 

2) A data subclass has a 1:1 relationship with a superior box, which provides
the identifier:

dc-activity
P:A002

dsc-acty-accountable


The  inferior  box  represents  a  special  case  of  the  superior  box. 

3) A repeating subclass has an N:1 relationship with a superior box, which
provides part of its identifier:


The  box  represents  many  instances  and  depends  on  the  superior  box. 

4)  A  secondary  key  has  a  1:N  relationship  with  another  box  and  identifies  a 
subset  of  it: 


sck-si-pack-pres        dc-supply-item
P:C021                  P:D046D

The  left-hand  box  represents  a  dependent  entity  (packaging  and  preservation  rules) 
which  is  accessed  only  through  the  other  entity. 

5) A shared subclass has two or more N:1 relationships with other boxes and is
partially  or  completely  identified  by  them: 


The  box  represents  an  N:N  relationship  with  possible  intersection  data  (in  this  case, 
the  quantity  of  an  item  at  an  activity).  Its  existence  depends  on  the  existence  of 
the  superior  boxes. 
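
How a dependent box draws its identifier from superior boxes can be sketched as follows. This is a simplified model, not the PSL representation: the class and function names are assumptions, and the rule that a box's identifier is the superiors' identifiers followed by its own key is a simplification that is only exercised here for data classes, data subclasses, and shared subclasses.

```python
from dataclasses import dataclass, field
from enum import Enum

class BoxType(Enum):
    DATA_CLASS = "data class"
    DATA_SUBCLASS = "data subclass"
    REPEATING_SUBCLASS = "repeating subclass"
    SECONDARY_KEY = "secondary key"
    SHARED_SUBCLASS = "shared subclass"

@dataclass
class Box:
    name: str
    box_type: BoxType
    own_key: tuple = ()                            # identifier data held in the box
    superiors: list = field(default_factory=list)  # boxes it depends on

def full_identifier(box):
    """Keys inherited from superior boxes, followed by the box's own key."""
    inherited = tuple(k for sup in box.superiors for k in full_identifier(sup))
    return inherited + box.own_key

def is_independent(box):
    """Only a data class can exist independently of other boxes."""
    return box.box_type is BoxType.DATA_CLASS and not box.superiors

activity = Box("dc-activity", BoxType.DATA_CLASS, ("A002",))
item = Box("dc-supply-item", BoxType.DATA_CLASS, ("D046D",))
accountable = Box("dsc-acty-accountable", BoxType.DATA_SUBCLASS, (), [activity])
stock = Box("ssc-si-stock-at-acty-acctbl-sp", BoxType.SHARED_SUBCLASS, (),
            [item, activity])
```

In this sketch the shared subclass is identified completely by its two superior boxes, which matches the "quantity of an item at an activity" case; a repeating subclass would add its own key to the inherited one.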

There are two types of lines, or relationships, other than those already
introduced:

1)  A  cross-reference  key  represents  a  1:1  or  1:N  relationship  between  two 
boxes. 


dc-supply-item

x1n-d046d-si-has-sa

dc-supply-action

P:D046D

S

P:A002

The  connectivity  and  the  symbolic  key  (a  DEN)  are  indicated  in  the  name  of  the 
relationship. 

2)  A  recursive  structure  represents  a  1:1,  1:N,  or  N:N  relationship  of  a  box 
to  itself: 


dc-component-or-equipment
P:D008

rnn-comp-comprises-comp

Note in all of these examples that naming conventions are very important. The
effective use of PSL/PSA depends to a very great extent on the development and
enforcement of strict naming conventions.

The graphic notation and PSL/PSA representation are based on two objectives: to
capture all necessary information about the data, and to present that information in
as simple a form as possible. The simplest form is a hierarchy of a data class, data
subclasses, repeating subclasses, and secondary keys. Shared subclasses really
belong to two or more hierarchies but are drawn under one hierarchy and represented in
PSL as a subpart of one hierarchy. The result is a collection of interrelated
hierarchies, each of which can be examined more or less independently. An example of one
such structure is shown in Figure 15. A RELATION whose name begins with "rel"
indicates a relationship with a shared subclass which has been placed in another
hierarchy. For example, at line 12, "rel-si-stock-at-acty-acctbl-sp" indicates that
there is a shared subclass called "ssc-si-stock-at-acty-acctbl-sp" in the
"dc-supply-item" hierarchy. Figure 16 shows more detail about a particular entity, and Figure
17 shows more detail about a particular relationship.
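
Because the type of each box or line is carried in the prefix of its name, a naming convention of this kind can be checked by program. The sketch below is illustrative only. The prefixes "dc", "dsc", "sck", "ssc", "rel", "x1n", and "rnn" appear in this section; the remaining connectivity prefixes ("x11", "r11", "r1n") are assumptions extrapolated from them, and no prefix is assumed for repeating subclasses because none is shown.

```python
# Prefixes taken from the examples in this section.
PREFIXES = {
    "dc": "data class",
    "dsc": "data subclass",
    "sck": "secondary key",
    "ssc": "shared subclass",
    "rel": "relation to a shared subclass in another hierarchy",
}
CONNECTIVITY = {"11": "1:1", "1n": "1:N", "nn": "N:N"}
# Cross-reference keys are 1:1 or 1:N; recursive structures may also be N:N.
LINES = {
    "x": ("cross-reference key", {"11", "1n"}),
    "r": ("recursive structure", {"11", "1n", "nn"}),
}

def classify(name):
    """Return the type implied by a name's prefix, or None if unrecognized."""
    prefix, _, rest = name.partition("-")
    if not rest:
        return None
    if prefix in PREFIXES:
        return PREFIXES[prefix]
    if len(prefix) == 3 and prefix[0] in LINES:
        kind, allowed = LINES[prefix[0]]
        if prefix[1:] in allowed:
            return f"{kind} ({CONNECTIVITY[prefix[1:]]})"
    return None
```

For example, `classify("x1n-d046d-si-has-sa")` reports a 1:N cross-reference key, while a name with no recognized prefix returns `None` and can be flagged for review.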

This step has two results: a data structure diagram for a particular RS, and
possible additions or modifications to the structure of the GLDB. A manually
prepared, greatly simplified GLDB diagram is shown in Figure 18. Clearly, manually
prepared diagrams are almost impossible to maintain. The next step is intended to
maintain the integrity of the GLDB structure as it expands in scope and detail.

REVIEW  AND  REVISE  GLDB  STRUCTURE 

This  review  should  be  conducted  frequently  enough  to  detect  and  resolve  problems 
before  they  have  many  side  effects,  yet  not  so  frequently  that  only  trivial  problems 
are  detected.  Each  data  class  is  reviewed  according  to  the  following  procedures: 

1)  Review  the  data  class  itself.  Ensure  that  it  has  a  meaningful  name  and 
description.  Combine  it  with  any  previously  reviewed  data  class  that  represents  the 
same  entity.  Subdivide  it  if  it  represents  two  or  more  fundamentally  different  kinds 
of  entities. 

2) Review the hierarchy of data subclasses, repeating subclasses, secondary keys,
and recursive structures. Ensure that the identifier of each box is appropriate to
its type.

Figure 15 - Example of GLDB Structure

Figure 16 - Example of Entity Detail

Figure 17 - Example of Relationship Detail


3)  Review  the  shared  subclasses  and  cross-reference  keys  in  the  hierarchy. 
Ensure  that  the  identifiers  of  shared  subclasses  are  appropriate.  Ensure  that  the 
relationships  are  meaningful. 

4) Resolve any problems that could not be resolved within an individual
hierarchy.

5)  Eliminate  redundant  relationships. 

Final documentation may be developed when all RS's have been completed. The use
of PSL/PSA to represent the evolving structure should eliminate the need for any
manually prepared documentation.

CRITIQUE 

The  problems  encountered  in  this  phase  could  have  been  avoided  by: 

1)  Using  a  smaller,  better  trained  group  of  analysts. 

2)  Having  guidance  available  in  the  form  of  a  good  conceptual  data  structure. 

3)  Adhering  to  a  strict  top-down  design  methodology,  with  data  elements 
assigned  only  after  the  high-level  GLDB  structure  had  been  completed  for  all  RS's. 

Figure 18 - Example of a GLDB Diagram

PHASE 3. DEFINE DATA BASE PROCESSES

The output of this phase is to be both a validation of the GLDB structure and
the development of the workload on it. The following seven steps are performed for
each RS:

1)  Map  the  RS  Data  into  the  GLDB  Structure 

2)  Split  or  Combine  RS  Processes 

3)  Assign  Data  to  File  or  Data  Base 

4)  Review  RS  Processes 

5)  Diagram  Data  Base  Processes 

6)  Represent  Data  Base  Processes  in  PSL/PSA 

7)  Review  and  Revise  GLDB  Structure. 

A  small,  well-trained  group  of  NAVSUP  analysts  performed  the  first  five  steps 
for  six  sample  RS's.  Although  the  steps  are  tedious,  the  results  seem  to  be  quite 
satisfactory.  The  sixth  step  was  performed  on  three  of  the  six  RS's.  As  expected, 
the  GLDB  structure  did  require  revision. 

MAP THE RS DATA INTO THE GLDB STRUCTURE

This step requires the analysis of each INPUT, OUTPUT, or GROUP in the RS to
determine the entities and relationships in the GLDB required to represent it. The
analysis is initially done top-down - i.e., the INPUT, OUTPUT, or GROUP is
represented first in terms of data class hierarchies, and then in the specific entities
and relationships within the hierarchy. The result is verified by checking to
ensure that each data element in the INPUT, OUTPUT, or GROUP is represented
somewhere in the entities and relationships.
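
The verification at the end of this step amounts to a set difference. The sketch below assumes, for illustration, that each GLDB box is available as a set of data element numbers; the function name and the sample contents are hypothetical.

```python
def unmapped_elements(rs_item_elements, gldb_boxes):
    """Return data elements of an INPUT, OUTPUT, or GROUP not found in any mapped box."""
    represented = set()
    for elements in gldb_boxes.values():
        represented.update(elements)
    return sorted(set(rs_item_elements) - represented)

# Hypothetical GROUP and mapping, using element numbers as names.
group = ["A002", "D046D", "C021", "E139"]
mapped = {
    "dc-activity": {"A002"},
    "dc-supply-item": {"D046D", "C021"},
}
leftover = unmapped_elements(group, mapped)
```

An empty result means that the mapping covers every data element; anything returned points either to an incomplete mapping or to a gap in the GLDB.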

SPLIT  OR  COMBINE  RS  PROCESSES 

First, this step requires the analysis of each process within the RS to determine
what entities and relationships are used, derived, or updated by it. Second,
each process is split into simpler processes if it involves a delay in data base
interaction (e.g., if it waits for verification by a clerk), or if it uses, derives,
or updates different sets of entities and relationships at different times (e.g., if
it uses different data at the end of the week and the end of the month). The process
is split because it represents two (or more) different processes as far as the data
interactions are concerned. Third, successive processes are combined if they occur
at the same frequency, if there are no delays, and if they interact with the same
entities and relationships. The processes are combined in order to more accurately
represent the workload - instead of two sequences of data base interactions, they
really represent a single sequence.
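
The three conditions for combining successive processes can be stated as a single test. The following is a minimal sketch under assumed representations: the `Process` fields, the "+" naming of a combined process, and the sample data are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Process:
    name: str
    frequency: str              # e.g. "daily", "weekly"
    touches: frozenset          # entities and relationships used, derived, or updated
    delay_before: bool = False  # True if the process waits (e.g. for a clerk)

def combine_successive(processes):
    """Merge a process into its predecessor when all three conditions hold."""
    combined = []
    for proc in processes:
        prev = combined[-1] if combined else None
        if (prev is not None
                and not proc.delay_before
                and proc.frequency == prev.frequency
                and proc.touches == prev.touches):
            combined[-1] = Process(prev.name + "+" + proc.name, prev.frequency,
                                   prev.touches, prev.delay_before)
        else:
            combined.append(proc)
    return combined

data = frozenset({"dc-supply-item", "dc-activity"})
steps = [
    Process("p1", "daily", data),
    Process("p2", "daily", data),
    Process("p3", "daily", data, delay_before=True),  # waits for a clerk
]
result = combine_successive(steps)
```

Here "p1" and "p2" collapse into one sequence of data base interactions, while "p3" stays separate because of the delay.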

ASSIGN  DATA  TO  FILE  OR  DATA  BASE 

This  step  involves  determining  whether  an  entity  or  relationship  should  be 
stored  in  a  private  file  or  in  a  data  base.  The  data  should  be  stored  in  a  private 
file  if  any  of  the  following  are  true: 

1)  The  data  are  of  interest  to  only  a  single  process  and  therefore  need  not 
be  shared  in  a  data  base. 

2)  The  data  are  transitory  and  would  not  exist  long  enough  to  be  relevant  to 
other  processes. 

3)  The  data  are  incomplete,  as  in  a  partially  completed  update,  and  therefore 
could  not  be  used  by  other  processes. 

4)  The  data  consist  entirely  of  references,  or  keys,  to  other  data,  are  of 
interest  to  only  one  process,  and  are  therefore  irrelevant  to  other  processes. 

The  data  should  be  stored  in  the  data  base  if  all  of  the  following  are  true: 

1)  The  data  are  of  interest  to  many  processes  and  should  therefore  be  shared. 

2)  The  data  are  sufficiently  long-lived  to  have  many  uses. 

3)  The  data  are  complete. 

4)  The  data  are  descriptive  of  the  real  world. 
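
The two lists of criteria can be read as a decision rule: any one private-file condition is sufficient, whereas all four data base conditions are required. The sketch below encodes that reading. The function name and the "undecided" outcome are assumptions; the report does not say how to treat data that satisfy neither list. The fourth private-file criterion (data consisting entirely of keys and of interest to one process) is treated here as a special case of the first.

```python
def assign_storage(shared, long_lived, complete, describes_real_world):
    """Apply the private-file and data base criteria to one entity or relationship."""
    # Any one of these is enough to keep the data in a private file.
    if not shared or not long_lived or not complete:
        return "private file"
    # All four conditions must hold for the data base.
    if shared and long_lived and complete and describes_real_world:
        return "data base"
    return "undecided"
```

For instance, a partially completed update (`complete=False`) is assigned to a private file regardless of the other answers.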

REVIEW  RS  PROCESSES 

The  next  step  is  to  eliminate  from  consideration  all  data  that  have  been 
assigned  to  private  files  and  all  processes  that  do  not  interact  with  the  data  base. 
The  remaining  processes  should  then  be  reviewed  for  possible  additional  combination: 

1) Successive processes are combined as in the second step if they really represent
two parts of a sequence of data base interactions.

2)  Processes  with  identical  data  base  interactions  are  combined,  with  suitable 
adjustment  of  frequency  of  occurrence,  since  they  are  indistinguishable  as  far  as  the 
workload  is  concerned. 
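
Combining processes with identical data base interactions can be sketched as a grouping in which frequencies are added. Summation is an assumption about what "suitable adjustment of frequency" means, and the monthly frequencies below are invented for illustration.

```python
from collections import defaultdict

def merge_identical(processes):
    """Sum the frequencies of processes whose data base interactions are identical."""
    totals = defaultdict(float)
    for interactions, per_month in processes:
        totals[tuple(interactions)] += per_month
    return dict(totals)

# (sequence of boxes visited, occurrences per month) - hypothetical values.
workload = [
    (["dc-supply-item", "dc-activity"], 100),
    (["dc-supply-item", "dc-activity"], 20),
    (["dc-activity"], 5),
]
merged = merge_identical(workload)
```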

DIAGRAM DATA BASE PROCESSES

This step involves the construction of a diagram showing the path through the
data base traversed by each data base process. Figure 19 shows a part of such a
diagram. The rectangles represent entities, the arrows represent relationships,
the circled numbers indicate the order of processing, the DEN on an arrow indicates
a retrieval key, the number near an arrow head represents the number of instances
retrieved, and the number inside the rectangle represents the number of instances
actually used (some of the instances may have been retrieved only to check for some
value). A DEN inside a rectangle represents the order in which the retrieved
instances are to be used, and an "A" indicates that an entity is needed only
to access another entity and that it has no data needed by the process (not shown
in the example). A "U" indicates that an entity or relationship is to be updated.

If it is impossible to construct a diagram because either a required
relationship or a required entity is not in the GLDB structure, the GLDB structure obviously
must be revised.
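
The annotations on such a diagram map naturally onto a record per step, and the check for missing entities or relationships becomes a lookup. This is a sketch only: the field names are assumptions, and the relationship "x11-sa-at-acty" is a hypothetical name used to show a gap in the GLDB.

```python
from dataclasses import dataclass

@dataclass
class Step:
    order: int                 # circled number on the diagram
    entity: str                # rectangle reached
    retrieved: int             # instances retrieved (number near the arrow head)
    used: int                  # instances actually used (number in the rectangle)
    relationship: str = ""     # arrow followed; empty for the entry point
    key: str = ""              # DEN on the arrow (retrieval key)
    access_only: bool = False  # "A": needed only to reach another entity
    update: bool = False       # "U": entity or relationship is updated

def missing_from_gldb(steps, gldb_entities, gldb_relationships):
    """List entities or relationships that a path needs but the GLDB lacks."""
    missing = []
    for step in sorted(steps, key=lambda s: s.order):
        if step.entity not in gldb_entities:
            missing.append(step.entity)
        if step.relationship and step.relationship not in gldb_relationships:
            missing.append(step.relationship)
    return missing

path = [
    Step(1, "dc-supply-item", 1, 1, key="D046D"),
    Step(2, "dc-supply-action", 12, 3, relationship="x1n-d046d-si-has-sa",
         key="D046D", update=True),
    Step(3, "dc-activity", 3, 3, relationship="x11-sa-at-acty"),  # hypothetical
]
gaps = missing_from_gldb(
    path,
    gldb_entities={"dc-supply-item", "dc-supply-action", "dc-activity"},
    gldb_relationships={"x1n-d046d-si-has-sa"},
)
```

A non-empty result corresponds to the case described above, in which the GLDB structure must be revised before the diagram can be completed.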

REPRESENT  DATA  BASE  WORKLOAD  IN  PSL/PSA 

Figure 20 shows an example of part of the structure of processes needed to
represent the data base workload. The highest level subsumes the workload at the
Ships Parts Control Center (SPCC) in Mechanicsburg, Pennsylvania; a similar
structure exists for the Aviation Supply Office (ASO) in Philadelphia. The second
level represents the different applications. The third level consists of the data
base processes. The fourth level describes each step in the path through the data
base.

Figure 21 shows a generalized example of a diagram of a data base process. Note
that "entity-e1" (possibly a data class) has subparts, "entity-e2", "entity-e3", etc.
(data subclasses, repeating subclasses, or secondary keys). Figure 22 shows the
representation in PSL of the diagram. Lines 2-6 indicate the number of instances
examined and the number accepted. Line 7 represents the "A" above the box. Lines
9-10 and 15-17 indicate volatility data not on the diagram. Lines 11 and 13 show
the data actually used (there could be many data subclasses, repeating subclasses,
or secondary keys), and lines 12 and 14 show the DENs used as the sort criteria
(there could be many DENs). Line 18 identifies the line into the box, and line 11
Base Process




PSA Version A5.1R50                              Apr 19, 1982 13:47:38

                          Data-Classes

                        Structure Report

   1  spcc-workload                 PROCESS
   2    spcc-a215-dbp               PROCESS
   3      spcc-a215-1               PROCESS
   4        spcc-a215-1-1           PROCESS
   5        spcc-a215-1-2           PROCESS
   6        spcc-a215-1-3           PROCESS
   7        spcc-a215-1-4           PROCESS
   8        spcc-a215-1-5           PROCESS
   9        spcc-a215-1-6           PROCESS
  10        spcc-a215-1-7           PROCESS
  11        spcc-a215-1-8           PROCESS
  12        spcc-a215-1-9           PROCESS
  13        spcc-a215-1-10          PROCESS
  14        spcc-a215-1-11          PROCESS
  15        spcc-a215-1-12          PROCESS
  16        spcc-a215-1-13          PROCESS
  17        spcc-a215-1-14          PROCESS
  18        spcc-a215-1-15          PROCESS
  19        spcc-a215-1-16          PROCESS
  20      spcc-a215-2               PROCESS
  21        spcc-a215-2-1           PROCESS
  22        spcc-a215-2-2           PROCESS
  23        spcc-a215-2-3           PROCESS
  24        spcc-a215-2-4           PROCESS
  25        spcc-a215-2-5           PROCESS
  26        spcc-a215-2-6           PROCESS
  27        spcc-a215-2-7           PROCESS
  28        spcc-a215-2-8           PROCESS
  29        spcc-a215-2-9           PROCESS
  30        spcc-a215-2-10          PROCESS
  31        spcc-a215-2-11          PROCESS
  32        spcc-a215-2-12          PROCESS
  33    spcc-a221-dbp               PROCESS
  34      spcc-a221-1               PROCESS
  35        spcc-a221-1-1           PROCESS
  36        spcc-a221-1-2           PROCESS
  37        spcc-a221-1-3           PROCESS
  38        spcc-a221-1-4           PROCESS
  39        spcc-a221-1-5           PROCESS
  40        spcc-a221-1-6           PROCESS
  41        spcc-a221-1-7           PROCESS
  42      spcc-a221-2               PROCESS
  43        spcc-a221-2-1           PROCESS
  44        spcc-a221-2-2           PROCESS
  45        spcc-a221-2-3           PROCESS
  46        spcc-a221-2-4           PROCESS
  47        spcc-a221-2-5           PROCESS
  48        spcc-a221-2-6           PROCESS
  49        spcc-a221-2-7           PROCESS
  50        spcc-a221-2-8           PROCESS
  51        spcc-a221-2-9           PROCESS
  52        spcc-a221-2-10          PROCESS
  53        spcc-a221-2-11          PROCESS
  54        spcc-a221-2-12          PROCESS
  55        spcc-a221-2-13          PROCESS

Figure  20  -  Example  of  Workload  Structure 

set-of-entity-e1

rs1-proc2-step1

rs1-proc2-step2

rs1-proc2-step3


Figure  21  -  Generalized  Example  of  a  Diagram  of  a  Data  Base  Process 

the  basic  entity.  Lines  19-21  represent  the  frequency  with  which  the  step  occurs, 
and  lines  22-23  indicate  its  successor  and  predecessor. 

Figure 23 shows the PSL/PSA representation of part of Figure 19. Boxes 1, 8,
9, 10, 11, 12, 13, 14 and others not shown in Figure 19 are DERIVED (accessed)
simultaneously because they are parts of the same basic entity. Figure 24 shows the
PSL/PSA representation of Box 2 of Figure 19. Note that "rll-si-substitutes" on
Figure 19 has been expanded into a PSL RELATION ("rej-si-substitutes") to a dummy
record ("dum-si-substitutes") and a RELATION ("rem-si-substitutes") from the dummy
record. The inverse RELATIONS, which are not needed in the example, are
"rek-si-substitutes" and "rel-si-substitutes." All recursive relationships and shared
subclasses must be similarly expanded to explicitly represent all the implied
relationships.

Figure 22 - Generalized Example of Workload Detail

REVIEW  AND  REVISE  GLDB  STRUCTURE 

The final step is to determine whether any entities are frequently accessed only
to access other entities and relationships. If so, such accesses may be eliminated
by the introduction of different relationships to provide direct access to the needed
data. PSA provides a report (Element-Process Utilization) which can be used to
determine how frequently each entity is accessed by processes with the KEYWORD
"used-only-to-access-other-data."
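
The tally that this review relies on can be illustrated outside PSA. The sketch below is not the Element-Process Utilization report; it assumes each workload step is available as an (entity, access-only flag, frequency) triple, and the frequencies are invented.

```python
from collections import Counter

def access_only_frequency(steps):
    """Total how often each entity is accessed only to reach other data."""
    totals = Counter()
    for entity, access_only, frequency in steps:
        if access_only:
            totals[entity] += frequency
    return totals.most_common()

# (entity, used only to access other data?, occurrences per month) - hypothetical.
steps = [
    ("dc-activity", True, 300),
    ("dc-supply-item", False, 300),
    ("dc-activity", True, 50),
    ("dc-component-or-equipment", True, 10),
]
ranking = access_only_frequency(steps)
```

Entities at the top of the ranking are the first candidates for a new relationship that gives direct access to the data behind them.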

Figure 23 - Example of Workload Detail

Figure 24 - Continuation of Workload Detail

PHASE 4. DESIGN PHYSICAL DATA BASE

The output of this phase is the design for a physical data structure which will
be hardware and software dependent and which will support all the subsystems at a
particular time. The physical data structure may be initially limited to the needs
of one or two subsystems but will grow as more subsystems are implemented. Since
the functional requirements are based on the logical data structures, existing
subsystems will be unaffected as new data are added to the physical data base. The
input to this phase consists of the logical data structure for the system, the
workload defined by the functional requirements for the subsystems, and hardware and
software specifications. In addition to satisfying all computerized data
requirements, the physical data structure must be efficient - note that this is the first
phase in which efficiency is an issue.

There  are  six  steps  in  this  phase: 

1)  Simplify  Data  Base  Structure  and  Processes 

2)  Determine  Sizes,  Volumes,  and  Volatilities 

3)  Form  Canonical  Records 

4)  Convert  to  Physical  Level 

5)  Design  Physical  Structure 

6)  Analyze  Results  and  Iterate. 

The first two steps are merely preparation for the use of a computer-aided data base
design system, which does the computational work involved in the third through
fifth steps.

SIMPLIFY  DATA  BASE  STRUCTURE  AND  PROCESSES 

The objective of this step is to reduce the complexity of the GLDB and the
workload enough to apply the design system described in steps 3 through 5. The
complexity of the ICP System design would be too great to be handled by the design
system: hundreds of entities would consist of thousands of data elements, and would be
retrieved by thousands of different processes.

The GLDB structure can be greatly simplified by treating all data subclasses
and secondary keys as if they were data elements (i.e., descriptors which cannot be
further subdivided). The only DENs which need appear in the GLDB structure are those
which are used as keys to retrieve or sort data. (Note that the data base processes
were defined in these terms in the previous phase.)

The data base processes can be simplified by combining those that represent
identical paths through the GLDB and eliminating those that occur relatively
infrequently.
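
The two simplifications above can be sketched in a few lines of Python. This is
only an illustration, not the procedure used for the ICP System: the entity names,
the frequencies, and the cutoff of 50 executions per day are all hypothetical, and
the choice to apply the cutoff after merging (so that several rare processes on the
same path can survive together) is an assumption.

```python
from collections import defaultdict

def simplify_processes(processes, min_frequency):
    """Merge processes that follow the same path, then drop rare ones.

    processes: iterable of (name, path, frequency), where path is the ordered
    sequence of entities visited and frequency is executions per day.
    """
    merged = defaultdict(lambda: {"names": [], "frequency": 0})
    for name, path, frequency in processes:
        entry = merged[tuple(path)]          # identical paths share one entry
        entry["names"].append(name)
        entry["frequency"] += frequency
    # Assumption: the frequency cutoff is applied to the merged totals.
    return {
        path: entry
        for path, entry in merged.items()
        if entry["frequency"] >= min_frequency
    }

# Hypothetical workload: (process name, path through the GLDB, runs per day)
workload = [
    ("P1", ("ITEM", "ORDER"), 400),
    ("P2", ("ITEM", "ORDER"), 150),
    ("P3", ("SUPPLIER",), 2),
    ("P4", ("ORDER", "ITEM"), 90),
]
simplified = simplify_processes(workload, min_frequency=50)
for path, entry in simplified.items():
    print(path, entry["names"], entry["frequency"])
```

Here P1 and P2 collapse into a single process because they follow the same path,
P4 is kept separately because it traverses the path in the opposite order, and P3
is eliminated as infrequent.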

DETERMINE  SIZES,  VOLUMES,  AND  VOLATILITIES 

This step is very simple: it involves only the determination of the size (in
characters), the number of instances, and the volatility (number of creations,
deletions, and modifications) of each entity. The size is merely the sum of the
sizes of the data elements in the entity.
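
As a minimal sketch of the bookkeeping in this step, the record below holds the
three quantities for one entity. The entity, its data elements, and all numbers are
invented for illustration; the derived `total_characters` figure is an added
convenience (size times number of instances), not something the step itself
requires.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    name: str
    element_sizes: dict      # data element name -> size in characters
    instances: int           # volume: number of occurrences of the entity
    creations: int           # volatility counts, per some agreed period
    deletions: int
    modifications: int

    @property
    def size(self) -> int:
        # The size of an entity is the sum of the sizes of its data elements.
        return sum(self.element_sizes.values())

    @property
    def total_characters(self) -> int:
        # Assumption: raw storage estimate, ignoring pointers and overhead.
        return self.size * self.instances

# Hypothetical entity with three data elements.
item = Entity(
    name="ITEM",
    element_sizes={"STOCK-NUMBER": 13, "DESCRIPTION": 40, "UNIT-PRICE": 9},
    instances=500_000,
    creations=200,
    deletions=50,
    modifications=12_000,
)
print(item.size, item.total_characters)
```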

FORM  CANONICAL  RECORDS 

This step is performed primarily by the data base design system, although there
can be some interaction with the data base designer. The design system uses the GLDB
structure to generate different collections of "canonical" records. A canonical
record is a set of data elements and relationship pointers (symbolic or physical)
which will be physically represented by a (possibly segmented) physical record. Each
collection of canonical records represents a different way of combining entities and
relationships to include the entire GLDB. For example, a repeating subclass
subordinate to a data class could be represented by a repeating group within the
data class or by a separate canonical record. If represented by a separate canonical
record, the relationship between the data class and the repeating subclass could be
represented by symbolic or physical pointers, or both, either from the data class to
the repeating subclass, or from the repeating subclass to the data class, or both
ways.

The design system includes various heuristics for reducing the number of canonical
records; otherwise, the number of different combinations of canonical records would
be far too great to be manageable. The data base designer can override the
heuristics and add other collections of canonical records, if desired.
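
The need for heuristics can be seen by counting the alternatives in the example
above. The sketch below enumerates the representations of a single repeating
subclass under one simple reading of the text (one pointer kind applied to one
choice of direction) and then counts the collections that result if several such
subclasses are chosen independently. The option names are hypothetical labels, and
the actual design system may generate its alternatives differently.

```python
from itertools import product

POINTER_KINDS = ("symbolic", "physical", "both")
DIRECTIONS = ("class_to_subclass", "subclass_to_class", "both_ways")

def alternatives_for_repeating_subclass():
    """List the ways one repeating subclass could be represented."""
    # Option 1: fold the subclass into its data class as a repeating group.
    options = [("repeating_group", None, None)]
    # Remaining options: a separate canonical record, linked by pointers.
    options += [
        ("separate_record", kind, direction)
        for kind, direction in product(POINTER_KINDS, DIRECTIONS)
    ]
    return options

def count_collections(number_of_subclasses):
    """Collections of canonical records if every subclass is chosen freely."""
    return len(alternatives_for_repeating_subclass()) ** number_of_subclasses

options = alternatives_for_repeating_subclass()
print(len(options))            # alternatives for a single repeating subclass
print(count_collections(3))    # three independent repeating subclasses
print(count_collections(20))   # twenty independent repeating subclasses
```

Even under this simple reading the count grows exponentially with the number of
repeating subclasses, which is why exhaustive generation is not practical.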

CONVERT  TO  PHYSICAL  LEVEL 

This step involves the conversion of the original workload, sizes, volumes, and
volatilities, which were defined for the GLDB, to the corresponding parameters for
each collection of canonical records. To continue the previous example, if the
repeating subclass becomes a repeating group in the data class, then any traversal
of the original relationship between the two will vanish at this step, and the size
and volatility of the data class will change to reflect the additional size and
volatility of the repeating subclass. The conversion is performed by the design
system.
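
A minimal sketch of this conversion for the repeating-group case follows. The
report does not give the conversion formulas, so two assumptions are made here and
should be read as such: the merged record grows by the mean number of repeats times
the subclass size, and every creation, deletion, or modification of a subclass
occurrence is counted as a modification of the containing data class record. All
names and numbers are hypothetical.

```python
def fold_into_repeating_group(parent, child, mean_repeats, traversals):
    """Convert GLDB parameters when a subclass becomes a repeating group.

    parent, child: dicts with name, size, instances, creations, deletions,
    modifications. traversals: list of (from_entity, to_entity, frequency).
    Returns the merged record parameters and the traversals that remain.
    """
    merged = {
        "name": parent["name"],
        # Assumption: average size; each parent holds mean_repeats children.
        "size": parent["size"] + mean_repeats * child["size"],
        "instances": parent["instances"],
        "creations": parent["creations"],
        "deletions": parent["deletions"],
        # Assumption: any change to a child occurrence rewrites the parent.
        "modifications": parent["modifications"] + child["creations"]
        + child["deletions"] + child["modifications"],
    }
    pair = {parent["name"], child["name"]}
    # Traversals of the parent-child relationship vanish in either direction.
    remaining = [t for t in traversals if {t[0], t[1]} != pair]
    return merged, remaining

order = {"name": "ORDER", "size": 60, "instances": 10_000,
         "creations": 500, "deletions": 480, "modifications": 1_000}
line = {"name": "ORDER-LINE", "size": 25, "instances": 40_000,
        "creations": 2_000, "deletions": 1_900, "modifications": 300}
gldb_traversals = [("ORDER", "ORDER-LINE", 3_000), ("ITEM", "ORDER", 700)]

merged, remaining = fold_into_repeating_group(order, line, 4, gldb_traversals)
print(merged["size"], merged["modifications"], remaining)
```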

DESIGN  PHYSICAL  STRUCTURE 

At this point the design system requires parameters describing the hardware and
software environment: sizes, speeds, the relative costs of retrieval time, update
time, and storage space, the availability of record segmentation, etc. For each
collection of canonical records, the design system determines a set of physical
record structures and access paths with near minimal cost and provides an evaluation
of performance. The design system can also evaluate a physical structure proposed by
the data base designer. Because heuristics are used to limit the number of possible
logical and physical structures generated and evaluated, it is impossible to
guarantee that the design system will produce an optimal structure; results to date,
however, indicate that the structures are close to optimal for reasonable
situations.
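
The role of the relative costs can be illustrated with a small sketch. It assumes,
purely for illustration, that the cost of a candidate structure is a weighted sum
of its retrieval time, update time, and storage space; the report does not state
the cost function used by the design system, and the candidate names and figures
below are invented.

```python
def total_cost(candidate, weights):
    """Weighted sum of the three scarce resources (assumed cost model)."""
    return (
        weights["retrieval"] * candidate["retrieval_seconds"]
        + weights["update"] * candidate["update_seconds"]
        + weights["storage"] * candidate["storage_characters"]
    )

def best_candidate(candidates, weights):
    """Return the candidate physical structure with the lowest total cost."""
    return min(candidates, key=lambda c: total_cost(c, weights))

# Hypothetical daily figures for three candidate physical structures.
candidates = [
    {"name": "hashed", "retrieval_seconds": 1_000,
     "update_seconds": 400, "storage_characters": 40_000_000},
    {"name": "indexed", "retrieval_seconds": 1_500,
     "update_seconds": 900, "storage_characters": 46_000_000},
    {"name": "sequential", "retrieval_seconds": 9_000,
     "update_seconds": 300, "storage_characters": 31_000_000},
]

time_is_scarce = {"retrieval": 1.0, "update": 1.0, "storage": 1e-5}
storage_is_scarce = {"retrieval": 1.0, "update": 1.0, "storage": 1e-3}

print(best_candidate(candidates, time_is_scarce)["name"])
print(best_candidate(candidates, storage_is_scarce)["name"])
```

With these invented figures the preferred structure changes when storage is
weighted more heavily, which is the kind of comparison the designer makes when
cost parameters are varied in the final step.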

ANALYZE  RESULTS  AND  ITERATE 

The final step is the responsibility of the data base designer. The hardware,
cost, and software parameters may be changed, and new physical structures generated
and evaluated. For example, the data base designer may be interested in comparing
the performance of different data base management systems.

PHASE 5. SIMULATE DATA BASE OPERATION

The data base design system is intended to generate and evaluate a large number
of reasonable physical data structures, given simplifying assumptions about the
workload (e.g., that data base processes are distributed evenly throughout the
working day), the hardware (e.g., that retrieval time, update time, and storage are
the scarce resources), etc. These are reasonable assumptions in many cases, but they
must be verified by more detailed analysis. This detail will be provided by a data
base simulator based on the Extendable Computer System Simulator (ECSS) (references
8 and 9), a preprocessor for SIMSCRIPT II.5. The simulator is currently being
developed and verified for ISDNLS by the Federal Computer Performance Evaluation and
Simulation Center (FEDSIM). ECSS provides special, pre-coded models for hardware and
operating system simulation; the data base simulator will add models for the special
case of a data base and data base management system (currently limited to CODASYL).
The result will provide a capability for tracing simulated retrievals and updates to
determine peak load, channel or memory contention, operating system performance,
etc. This level of detail would clearly not be possible when large numbers of data
structures were being evaluated, but it is reasonable for evaluation of a small
number which seem to be close to optimal. Also, the detailed simulation allows for
determination of the sensitivity of the data base performance to small changes in
structure, workload, hardware, and software. The quick, approximate data base design
system and the slow, detailed simulator appear to complement each other very well.
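
A short piece of arithmetic shows why the even-distribution assumption needs to be
checked. The sketch below is not a simulation and has nothing to do with ECSS
itself; it only compares device utilization computed from a daily average with
utilization computed hour by hour. The hourly request counts and the service time
of 1.2 seconds per request are hypothetical.

```python
def utilization(requests_in_hour, service_seconds):
    """Fraction of one hour a single device is busy serving the requests."""
    return requests_in_hour * service_seconds / 3600

# Hypothetical requests arriving in each hour of an eight-hour working day.
requests_per_hour = [300, 300, 2400, 3300, 600, 300, 300, 300]
service_seconds = 1.2

# Design-system view: the day's requests spread evenly over the hours.
mean_requests = sum(requests_per_hour) / len(requests_per_hour)
even_utilization = utilization(mean_requests, service_seconds)

# Hour-by-hour view: the busiest hour determines whether the device keeps up.
peak_utilization = max(utilization(n, service_seconds) for n in requests_per_hour)

print(round(even_utilization, 3), round(peak_utilization, 3))
```

With these figures the averaged workload keeps the device busy about a third of
the time, while the busiest hour demands more than the device can deliver; a
structure that looks acceptable under the simplifying assumption may therefore
still fail at peak load.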

PHASE 6. DESIGN OPERATIONAL SUBSYSTEMS

The methodology currently does not cover this phase, but some general remarks
are nevertheless appropriate. This final output of the design process should be a
specification of the way in which each subsystem will be implemented. If operational
specifications are produced without going through the preceding phases, the choices
which can be made by designers will be reduced, and system costs will increase and
system effectiveness decrease. Furthermore, the large amount of detail involved in
all the operational specifications may overwhelm the designers, so that many mistakes
will be made, particularly in the interfaces. Finally, flexibility will certainly
be reduced without the benefit of the preceding phases.

Much  of  the  difficulty  encountered  in  the  ICP  System  redesign  can  be  traced  to 
writers  who  wrote  operational  specifications,  rather  than  functional  requirements, 
in  the  RS's;  more  guidance  and  training  would  have  led  to  a  better  result  in  less 
time  at  lower  cost. 

CONCLUSIONS


A great deal of unnecessary effort could have been avoided in the ICP System
redesign by adhering to the following guidelines:

1) Follow a strict top-down design sequence, as outlined in Phases 0 through 6.

2)  Perform  the  design  phases  with  a  small  number  of  highly  trained  people. 

The use of PSL/PSA has proven successful. The data base design system and data
base simulator are expected to be equally successful.